CPS222 Lecture: Linked Lists                           Last revised 1/31/15

Objectives:

1. To show how to implement linked lists in C++
2. To introduce linked list variants: circular lists, doubly-linked lists,
   use of a header node.

Materials:

1. Handout of StudentList interface and simple list implementation, plus
   executable code.
2. Online versions of various student list implementations to project:
   (w/header) studentlisth.cc    (circular w/header) studentlisthc.cc
   (doubly linked) studentlistd.cc
3. Handouts developing the differences between each version and the preceding one
4. Handout comparing C++ list template with Java List collection

I. Introduction
-  ------------

   A. We will now look at how to implement linked lists.  We will use C++
      for our examples, but the strategy is easily adapted to any language
      that supports pointers (including Java, whose references really
      behave like C++ pointers).

   B. One issue we face in implementing linked lists is that the concept
      of a list is very general.  In particular, whenever we use a list we
      typically apply some specific ordering rule to control where new
      elements are added to the list.  (E.g. in a first-come first-served
      situation, new elements are always added at the end.)

      1. To make our examples concrete - and because this particular example
         allows us to introduce a variety of concepts - we will consider the
         case of a list of students, maintained in "priority" order by
         academic year - i.e. all seniors will appear before all juniors, who
         will appear before all sophomores ...

         a. A newly added student will appear after all students of the same
            or higher class, but before all students of lower class - e.g.
            a newly added junior will come after all seniors and juniors, but
            before all sophomores.

         b. We will restrict ourselves to accessing just the first student
            in the list, and will only allow ourselves to remove the first
            student in the list.  (Maybe this structure is modelling some
            sort of waiting line.)

         c. The elements in our list - our nodes - will contain:

            i. A person's name

           ii. A class year (4 = senior, 3 = junior ...)

          iii. A link that points to the next item in the list (often named
               next).

               In addition, we will use a single "external pointer" to point to
               the first item in the list.

            EXAMPLE: Using these conventions, suppose we start with an 
            initially empty list and insert items into it in the order:

                info            year

                aardvark        3
                buffalo         2
                cat             4
                dog             3

            Then the list would look like this:

                __________      ____________    __________      __________
                | cat    |      | aardvark |    | dog    |      | buffalo|
                | 4      |      | 3        |    | 3      |      | 2      |
        o------>|      o-|----->|        o-|--->|      o-|----->|      o-|---
                ----------      ------------    ----------      ----------  |
                                                                          -----
                                                                           ---
           iv. Note:

               -  A single external pointer (not part of any node on the list)
                  points to the first node.

               - Access to any node is by following a chain of one or more
                 pointers.  Thus, for example, to get to "dog", one would have
                 to follow the external pointer to "cat", then cat's link to
                 "aardvark", then aardvark's to dog.

               - The last node in the list contains a special pointer value that
                 indicates that there are no more nodes - a null pointer.
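            The node layout just described might be sketched in C++ as
            follows.  This is a minimal sketch, not the handout's code: the
            struct name Node, its fields name, year and next, and the helpers
            buildExample() and freeAll() are hypothetical, and it uses C++11's
            nullptr where the handout uses NULL.  The example list is linked
            up by hand, back to front.

```cpp
#include <cassert>
#include <string>

// Hypothetical node layout: a name, a class year, and a link to the next node.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y, Node* nx) : name(n), year(y), next(nx) {}
};

// Builds cat(4) -> aardvark(3) -> dog(3) -> buffalo(2) by hand and
// returns the external pointer (a pointer to the first node).
Node* buildExample() {
    Node* b = new Node("buffalo", 2, nullptr);   // null link marks the end
    Node* d = new Node("dog", 3, b);
    Node* a = new Node("aardvark", 3, d);
    return new Node("cat", 4, a);
}

// Frees every node reachable from first.
void freeAll(Node* first) {
    while (first != nullptr) {
        Node* d = first;
        first = first->next;
        delete d;
    }
}
```

            Following the chain from the external pointer,
            first->next->next->name is "dog", and the last node's next is
            nullptr.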

         d. A comment is in order on the value used to mark the end of the
            list:

            i. In Java, there is a reserved word "null" which is a reference
               to nothing, and can be used for this purpose.

           ii. In C++, a pointer with a value of 0 serves the same purpose.
               Conventionally, we refer to such a value by the name NULL
               (all caps) - but, unlike null in Java, this is not part of
               the language itself.  Instead, standard header files such as
               <cstddef> define NULL as a macro, using #define.  If you use
               NULL in a program and get complaints about NULL being
               undefined, include that header:

                #include <cstddef>

               (Writing #define NULL 0 yourself also works, but should only
               be a last resort.)  C++11 also added the keyword nullptr,
               which is the preferred way to write a null pointer in modern
               C++ code.

      2. We will define the following operations on this structure:

         a. constructor

         b. bool isEmpty() - accessor - test to see whether list is empty

         c. void makeEmpty() - mutator - clear out contents

         d. void insert(string name, int year) - mutator

         e. string getFirst() - accessor - returns name of first student

         f. void removeFirst() - mutator - removes first student

         g. void remove(string name) - mutator - removes a specific student

         h. In addition, for demonstration/testing purposes, we will
            include an accessor called print to print out all the nodes
            on the list in order to cout.

         i. Proper management of memory will also require us to write some
            additional methods, which we will discuss after considering the
            major methods listed above.

            i. A destructor to delete all the nodes when the list itself is
               destroyed - lest we have a "storage leak".

           ii. A copy constructor - to ensure that we make copies of all the
               individual nodes when we copy the list

          iii. An assignment operator - for similar reasons

II. Implementing Basic Linked Lists in C++
--  ------------ ----- ------ ----- -- ---   

   A. We can now develop the code needed for our example list

   B. studentlist.h 

      HANDOUT page 1

      1. Walk through the prototypes - note that we will discuss the copy 
         constructor, operator =, and destructor later.

      2. Note the INCOMPLETE (forward) declaration of the nested class Node
         (needed to make the _first field declaration possible.)

      3. _first is our EXTERNAL POINTER to the list.  Note that the first
         node on the list is special - it is pointed to by the external
         pointer, whereas all the other nodes are pointed to by the
         preceding node.

   C. studentlist.cc

      HANDOUT pages 2-4

      Walk through the nested class and methods, skipping the last three

      1. Nested class Node.

         a. Note how its methods are declared and implemented in the same
            place - appropriate since the details of the class are only
            needed here.

         b. The methods of class StudentList need to manipulate the fields
            of class Node.  This is made possible by declaring Node's fields
            to be public; since Node itself is private within class
            StudentList, only StudentList's code can name Node and reach
            them.  An alternative approach would be to make the fields of
            Node private, and include the following in class Node:

            friend class StudentList;

         c. Note code to report when a node is being destroyed - just for
            our demonstration purposes.

      2. Constructor - makes external pointer NULL

         Time complexity of this operation?

         ASK

         O(1)

      3. isEmpty() - tests to see whether external pointer is NULL

         Time complexity of this operation?

         ASK

         O(1)

      4. makeEmpty() - walks through the list, deleting nodes.  Note the
         need for two variables, d and p: we cannot (reliably) access the
         link of a node after it has been deleted.

         i. Trace for a two node list

        ii. Time complexity of this operation?

            ASK

            O(n)
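            A sketch of the makeEmpty() loop with its two variables d and p,
            written as a free function on a hypothetical Node struct rather
            than as the handout's member function.  The destructor message
            and the static destroyed counter are illustrative additions that
            mimic the handout's report-on-destruction code.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical node whose destructor reports itself and counts deletions.
struct Node {
    std::string name;
    int year;
    Node* next;
    static int destroyed;
    Node(const std::string& n, int y, Node* nx) : name(n), year(y), next(nx) {}
    ~Node() {
        ++destroyed;
        std::cout << "Destroying node for " << name << '\n';
    }
};
int Node::destroyed = 0;

// Deletes every node and leaves the external pointer null.
// p runs ahead; d holds the node about to be deleted, because
// d->next must not be read once d has been deleted.
void makeEmpty(Node*& first) {
    Node* p = first;
    while (p != nullptr) {
        Node* d = p;
        p = p->next;   // advance BEFORE deleting
        delete d;
    }
    first = nullptr;
}
```

            Tracing a two-node list: d takes the first node while p moves to
            the second, the first node is deleted, and the same happens once
            more before p becomes null.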

      5. insert(name, year)

         a. Basic process for insertion into ANY linked structure is

               Get a node
               Load it up
               Link it in

         b. new Node(name, year) does first two

         c. Before linking it in, we need to determine where.  We can't
            know this until we find the node that belongs AFTER it (the
            first one whose class year is less than the new student's).
            But to link the new node in correctly, we also need to know
            the node that goes BEFORE it.

         d. Note the two conditions on the loop - we stop if we run off the
            end of the list (in which case the new node goes at the very
            end) or if we find a node with year < the new student's year.

         e. This technique of moving two pointers down the list together,
            one a step behind the other, is called using a LEADING pointer
            and a TRAILING pointer.

         f. Linking code - the new node must point to its successor (or
            NULL if we ran off the end of the list), and its predecessor
            must point to it.  If the trailing pointer q is still NULL, the
            loop never advanced, so the new node goes at the start of the
            list; in that case we reset the external pointer instead.

         g. Trace process of building up a list, from empty, as follows:

             aardvark 3
             buffalo 2
             cat 4
             dog 3

         h. Time complexity of this operation?

            ASK

            O(n)
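            The insert steps above (get a node, load it up, link it in)
            might look like the following sketch.  It is a hypothetical free
            function, not the handout's member function, taking the external
            pointer by reference so that it can be changed; p is the leading
            pointer and q the trailing one.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for this sketch.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y) : name(n), year(y), next(nullptr) {}
};

// Places the new student after all students of the same or higher year,
// and before all students of lower year.
void insert(Node*& first, const std::string& name, int year) {
    Node* n = new Node(name, year);   // get a node, load it up
    Node* q = nullptr;                // trailing pointer
    Node* p = first;                  // leading pointer
    // && short-circuits, so p->year is read only when p is non-null
    while (p != nullptr && p->year >= year) {
        q = p;
        p = p->next;
    }
    n->next = p;                      // link to successor (null at the end)
    if (q == nullptr)
        first = n;                    // new front: change the external pointer
    else
        q->next = n;                  // otherwise change predecessor's link
}
```

            Inserting aardvark 3, buffalo 2, cat 4 and dog 3 into an empty
            list yields cat, aardvark, dog, buffalo - the order in the
            diagram earlier.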

      6. getFirst()

         a. What will happen if _first == NULL (empty list)?

            ASK

            Note precondition in .h file

         b. Time complexity of this operation?

            ASK

            O(1)

      7. removeFirst()

         a. Will also crash if _first == NULL

         b. Note that we need to explicitly recycle the node by using
            delete - else it is lost for the remainder of the run of
            the program!

         c. Time complexity of this operation?

            ASK

            O(1)
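            getFirst() and removeFirst() are both O(1) because they touch
            only the first node.  In this sketch (hypothetical free
            functions, not the handout's code), the empty-list precondition
            is checked with assert, so in a debug build a violation stops
            the program with a message instead of crashing unpredictably.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for this sketch.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y, Node* nx) : name(n), year(y), next(nx) {}
};

// Precondition: the list is not empty.
std::string getFirst(const Node* first) {
    assert(first != nullptr);
    return first->name;
}

// Precondition: the list is not empty.  Unlinks and recycles the first node.
void removeFirst(Node*& first) {
    assert(first != nullptr);
    Node* d = first;
    first = first->next;   // external pointer skips the old first node
    delete d;              // without this, the node is lost for the whole run
}
```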

      8. remove(name)

         a. First we search for the node.  We need to have a pointer
            to the node BEFORE it as well, since we must modify the
            pointer in that node.  Again, we use leading and trailing
            pointers.

         b. The while loop terminates when we either find the node or
            run off the end of the list.

         c. To unlink the node from the list, we reset the pointer of
            the node BEFORE it - or the external pointer if it is the
            first node.

         d. Walk through deleting dog, then cat from example used for
            insert.

         e. Note, again, that we need to explicitly recycle the node by using
            delete - else it is lost for the remainder of the run of
            the program!

         f. Time complexity of this operation?

            ASK

            O(n)
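            remove(name) combines the leading/trailing search with the
            front-of-list special case.  A sketch as a hypothetical free
            function; as an assumption, it removes only the first matching
            node and silently does nothing if the name is absent.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for this sketch.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y, Node* nx) : name(n), year(y), next(nx) {}
};

void remove(Node*& first, const std::string& name) {
    Node* q = nullptr;   // trailing pointer
    Node* p = first;     // leading pointer
    while (p != nullptr && p->name != name) {
        q = p;
        p = p->next;
    }
    if (p == nullptr)
        return;                  // ran off the end: not found
    if (q == nullptr)
        first = p->next;         // first node: reset the external pointer
    else
        q->next = p->next;       // otherwise the predecessor jumps around p
    delete p;                    // recycle the unlinked node
}
```

            Deleting dog and then cat from the insert example leaves
            aardvark, buffalo.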

      9. print()

         a. Involves a loop in which we traverse the list - visiting
            every node.

         b. Time complexity of this operation?

            ASK

            O(n)
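            A traversal sketch for print().  As an assumption (the handout
            prints to cout), this version takes the output stream as a
            parameter defaulting to std::cout, so the output can also be
            captured in a string; the one-line-per-student format is
            likewise made up.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical node type for this sketch.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y, Node* nx) : name(n), year(y), next(nx) {}
};

// Visits every node in order, from the external pointer to the null link.
void print(const Node* first, std::ostream& out = std::cout) {
    for (const Node* p = first; p != nullptr; p = p->next)
        out << p->name << ' ' << p->year << '\n';
}
```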

   D. Demonstrate class linked with test driver.

   E. The code for studentlist includes three methods that are necessary
      because of the way C++ does storage allocation.

      1. We begin with the destructor - the last operation listed.

         a. In C++, a class may have an explicit destructor.  The signature
            of the destructor is always ~ClassName - with no return value
            and no parameters.  

            If the programmer does not supply a destructor, the compiler
            creates a "Miranda rule" destructor that simply destroys each
            field in turn - which, for a pointer field, does nothing at all
            to the object it points to.

         b. The destructor for an object is called in one of the following ways.

            i. If the object is declared as an ordinary (non-pointer,
               non-reference) variable, then the destructor is automatically
               called when the variable's lifetime ends:

               - Termination of the program for a static variable.

               - Exit from a block for an automatic variable.

               - When it is no longer needed for a temporary variable
                 created by the compiler.

           ii. If the object is declared as a heap variable accessed by a
               pointer, the programmer must invoke the delete operation on
               the pointer.

          iii. It is also possible to call a destructor explicitly (like
               other methods), but there is a special syntax used, and this
               is rarely needed.
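            The two common cases can be seen with a tiny hypothetical class,
            Tracer, that counts live objects: an automatic variable is
            destroyed at block exit, while a heap object is destroyed only
            when delete is applied to its pointer.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical class that reports its destruction and counts live objects.
struct Tracer {
    static int alive;
    Tracer() { ++alive; }
    ~Tracer() {
        --alive;
        std::cout << "~Tracer\n";
    }
};
int Tracer::alive = 0;

void automaticDemo() {
    Tracer t;                    // automatic variable
    assert(Tracer::alive == 1);
}                                // destructor runs here, at block exit

void heapDemo() {
    Tracer* p = new Tracer;      // heap object, reached through a pointer
    assert(Tracer::alive == 1);
    delete p;                    // destructor runs only because of delete
    assert(Tracer::alive == 0);
}
```

            Leaving out the delete in heapDemo() would leave the object
            alive (and unreachable) after the function returns.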

         c. A programmer-written destructor is needed when the object
            "owns" resources that must be freed when it is destroyed.
            Here, we require that all the nodes on the list be freed up
            when the external pointer to the list is destroyed, lest
            the storage allocated to them be lost for the duration of the program.

            Go over destructor in program.

            Here, the destructor uses the makeEmpty() method to delete all
            the nodes in the list.

      2. When a class has a destructor, it often needs two other methods
         to prevent resources from being released prematurely.

         a. To see why, consider the following scenario, based on our
            StudentList class (but for now without the copy constructor
            and assignment operator).

            StudentList s;
            s.insert("aardvark", 3);
            s.insert("buffalo", 2);
            foo(s);
            bar();

            where foo() looks like this:

            void foo(StudentList x)
            {
              ...
            }

            and bar() looks like this:

            void bar()
            {
              StudentList y;
              y = s;
            }

            i. Note that the list is passed to foo by value.  When we enter foo,
               we have the following scenario, because x is initialized to be
               a copy of s.  

                        ------------       ------------       ------------
            Top level   | _first o-|-----> | aardvark |   --> | buffalo  |
            variable s  ------------   --> | 3        |  /    | 2        |
                                      /    |    o-----|--     | NULL     |
                        ------------ /     ------------       ------------
            Parameter x | _first o-|-
                        ------------

               That is, both s and x refer to the same list of nodes.  We
               say that x is a SHALLOW COPY of s.

           ii. Now what happens when foo exits?  The destructor for x is
               called, since x is local to foo.  This results in the nodes
               on the list being recycled - destroying the list pointed to
               by s!  (Actually, the _first pointer in s refers to a recycled 
               node, which can lead to almost anything going wrong.)

          iii. A similar situation happens in bar.  The assignment statement
               causes y to become a shallow copy of s.  When bar exits, y
               is destroyed and the nodes on the list are recycled - again!

         b. To prevent these problems from arising, we must ensure that
            whenever a list is copied, we make a DEEP COPY that copies all
            the nodes, not just the external pointer.  That requires us
            to implement two methods.

            i. A copy constructor - constructor with signature

                 StudentList(const StudentList &)

                 - The compiler uses this whenever it must copy an object -
                   e.g. when a parameter is passed by value, or a function
                   result is returned by value.

                 - If the programmer doesn't write one, the compiler creates
                   a "Miranda rule" copy constructor that simply copies the
                   object field by field (i.e. a shallow copy if the object
                   contains any pointers.)

           ii. Overload of the assignment operator - method with signature
                
                 StudentList & operator = (const StudentList &)

                 - The compiler uses this whenever assignment is done using =.
                 
                 - If the programmer doesn't write one, the compiler creates
                   a "Miranda rule" assignment operator that simply copies
                   the object field by field (i.e. a shallow copy if the
                   object contains any pointers.)

         c. Go over code for copy constructor and assignment operator in
            example.

            i. Note how copy constructor uses assignment operator, to avoid
               having to write code twice.

           ii. Note return from assignment operator - needed to allow
               chaining of assignments:

                a = b = c;
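            Putting the three memory-management methods together, a
            cut-down StudentList might look like the sketch below.  It is
            not the handout's code: the Node fields are named name, year and
            next, and print() and remove() are omitted for brevity.  As in
            the handout, the copy constructor starts with an empty list and
            then reuses operator=; the self-assignment check is an addition
            that keeps s = s from emptying s before copying it.

```cpp
#include <cassert>
#include <string>

class StudentList {
public:
    StudentList() : _first(nullptr) {}
    // Copy constructor: start empty, then reuse operator= for the deep copy.
    StudentList(const StudentList& other) : _first(nullptr) { *this = other; }
    ~StudentList() { makeEmpty(); }   // recycle all nodes: no storage leak

    StudentList& operator=(const StudentList& other) {
        if (this != &other) {         // guard against s = s
            makeEmpty();              // recycle our own nodes first
            Node* last = nullptr;     // last node copied so far
            for (Node* p = other._first; p != nullptr; p = p->next) {
                Node* n = new Node(p->name, p->year);   // copy EVERY node
                if (last == nullptr) _first = n; else last->next = n;
                last = n;
            }
        }
        return *this;                 // allows a = b = c
    }

    bool isEmpty() const { return _first == nullptr; }

    void makeEmpty() {
        while (_first != nullptr) {
            Node* d = _first;
            _first = _first->next;
            delete d;
        }
    }

    void insert(const std::string& name, int year) {
        Node* q = nullptr;
        Node* p = _first;
        while (p != nullptr && p->year >= year) { q = p; p = p->next; }
        Node* n = new Node(name, year);
        n->next = p;
        if (q == nullptr) _first = n; else q->next = n;
    }

    std::string getFirst() const {
        assert(!isEmpty());
        return _first->name;
    }

    void removeFirst() {
        assert(!isEmpty());
        Node* d = _first;
        _first = _first->next;
        delete d;
    }

private:
    struct Node {
        std::string name;
        int year;
        Node* next;
        Node(const std::string& n, int y) : name(n), year(y), next(nullptr) {}
    };
    Node* _first;   // external pointer
};
```

            For example, after StudentList b(a); a.removeFirst(); the copy b
            still begins with the original first student, and d = c = b;
            compiles because operator= returns a reference to *this.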

III. Variants of the basic linked list
---  -------- -- --- ----- ------ ----

   A. Use of a header node.

      1. In algorithms involving linked lists, there are certain crucial points 
         one must bear in mind:

         a. To insert an item into a linked structure, it is necessary to modify
            the link field of its predecessor.  This means that to insert an
            item, we must traverse the list from its beginning until we reach
            the place we want.  Often this is done with two pointers - a leading
            pointer and a trailing pointer.  When the leading pointer hits the
            item that is to be the SUCCESSOR of our new item, the trailing 
            pointer is on its predecessor.  We did this above.

         b. An exception to the above rule occurs when the item we are inserting
            is the first in the list.  Then we modify the EXTERNAL POINTER to 
            the structure, since the item has no predecessor.  This was 
            included as a special case in our insert method above.

         c. A similar principle holds with deleting an item from a linked list.
            We must modify the pointer field in its predecessor to point to its
            successor or, if it is the first item in the list, we must modify
            the external pointer.  Again, this was included as a special case in
            our remove(name) method above.  (It was the ONLY case in our
            removeFirst() method.)

      2. These points imply that many list algorithms will have the following
         structure:

         Traverse the list, using leading and trailing pointers, until you
           have found the proper place - being sure not to run off the end
           of the list.
         if leading pointer is on the first item in the list then
           modify the external pointer to the list to effect the structural
           change (i.e. to point to the newly inserted node or to jump
           around the deleted node.)
         else
           modify the link field of the node pointed to by the trailing
           pointer to effect the structural change.

      3. The fact that modifications at the front of the list are a special
         case complicates general-purpose algorithms.  Often, it is desirable
         to eliminate this special case.  One method for doing so is the use
         of a HEADER NODE:

         a. When an "empty" list is created, it is actually created to contain
            one special node called a header.  This node is not functionally a
            part of the list as far as a user of the list is concerned; but it
            simplifies the algorithms since the first useful item on the list
            is actually the successor of the header and thus requires no
            special cases when accessing.  (Every node that is officially
            part of the list now has a predecessor node - possibly the
            header.)

         b. Example: redraw list containing cat 4, aardvark 3, dog 3, 
            buffalo 2 with a header.

         c. Note that we don't actually make any use of the values stored
            in this node (_name and _year) - just its link.  (We'll discuss
            a way to make use of one of the values later.)

      4. Let's consider how our code would be modified to use a header.

         a. Project studentlisth.cc - down to first #define.

            i. In the list without a header, the external pointer went to
               the first node.  Here it goes to the header, which in turn
               goes to the first node.

           ii. For clarity, we will change the name of the private field
               from _first to _header.  Rather than creating a new .h file,
               the code here uses the preprocessor to do the job for us!

         b. Any changes needed to Node class? 

            ASK

            NO

         c. Any changes needed to constructor?

            ASK

            Discuss code projected versus handout

         d. Any changes needed to isEmpty()?

            ASK

            Discuss code projected versus handout

         e. Any changes needed to makeEmpty()?

            ASK

            Discuss code projected versus handout - note that we don't
            recycle the header, just the nodes containing data

         f. Any changes needed to insert()?

            ASK

            Discuss code projected versus handout

         g. Any changes needed to getFirst()?

            ASK

            Discuss code projected versus handout

         h. Any changes needed to removeFirst()?

            ASK

            Discuss code projected versus handout

         i. Any changes needed to print()?

            ASK

            Discuss code projected versus handout

         j. Any changes needed to copy constructor?

            ASK

            Discuss code projected versus handout

         k. Any changes needed to operator = ?

            ASK

            Discuss code projected versus handout

         l. Any changes needed to destructor?

            ASK

            Discuss code projected versus handout

         HANDOUT with changes
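      With a header node, the trailing pointer can start on the header, so
      insert and remove lose their front-of-list special cases.  A minimal
      sketch on a hypothetical Node struct (not studentlisth.cc itself),
      with the header created by a hypothetical makeList() helper; the
      header's name and year are never used here.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for this sketch.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y) : name(n), year(y), next(nullptr) {}
};

// An "empty" list is a lone header whose next is null.
Node* makeList() { return new Node("", 0); }

bool isEmpty(const Node* header) { return header->next == nullptr; }

// q starts on the header, so every real node has a predecessor.
void insert(Node* header, const std::string& name, int year) {
    Node* q = header;
    Node* p = header->next;
    while (p != nullptr && p->year >= year) {
        q = p;
        p = p->next;
    }
    Node* n = new Node(name, year);
    n->next = p;
    q->next = n;              // no "new first node" special case
}

void remove(Node* header, const std::string& name) {
    Node* q = header;
    Node* p = header->next;
    while (p != nullptr && p->name != name) {
        q = p;
        p = p->next;
    }
    if (p == nullptr)
        return;               // not found
    q->next = p->next;        // no "removing first node" special case
    delete p;
}
```

      Because the external pointer always refers to the header, it never
      changes after makeList(), so these functions take a plain Node*
      rather than the Node*& that the versions without a header need.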

   B. Another special case occurs when we reach the end of a linked list.
      The end is normally marked by having the last item in the list have a
      link value that does not represent a legal pointer to a node - i.e. NULL.

      1. We must be terribly careful not to use this as if it were a legal
         pointer - e.g. the while loops in insert and remove test for
         p != NULL before they examine p -> _year (insert) or p -> _name
         (remove).  C++ guarantees that if two conditions are connected by
         &&, and the first proves to be false, the second won't even be
         evaluated.  (False AND anything is false.)

      2. One way to avoid having to be careful to always check for this
         special case would be with the use of a TRAILER node.

         a. The trailer node would contain a year LESS than any possible
            value - e.g. 0. The while loop would then be:

                while (p -> _year >= year)

            and this would be guaranteed to exit when p is on the trailer.

         b. The initial condition of an empty list would be two nodes: an
            external pointer pointing to a header pointing to a trailer.

         c. However, this approach as such is not often used, because of the
            need to create two special nodes just to make an empty list.

      3. An alternate approach to achieving nearly the same effect is the use
         of CIRCULAR LINKING.

         a. In this approach the last actual node, instead of containing a
            NULL pointer, points back to the first node on the list.
            If a header is used (as it often is with circular linking), then
            the "first node" is the header node, of course.  Thus, a circular
            list with a header looks like this:

            Empty list:     +--> [ Header ]--+  -- header points to itself
                            |                |
                            +----------------+

            List w/2 real   +--> [ Header ]--> [  ]--> [  ]--+
            elements:       |                                |
                            +--------------------------------+

         b. Recall that, in the example we have developed thus far, we didn't
            make any use of the _name or _year fields of the header.  Now
            what we will do is store in the header a year that is SMALLER
            than any possible year - e.g. 0.  (If the list were in
            increasing order, we would store a larger value than any possible
            legal value, of course.)  Thus, in effect, the same node serves
            as BOTH a header and a trailer.
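         A sketch of the circular version, assuming legal years are 1-4 and
         that the header stores year 0 so that it doubles as the trailer.
         The insert loop then needs no null test: the scan always stops, at
         the latest when p comes back around to the header.  The helpers
         makeList() and length() are hypothetical; length() shows how a
         traversal ends on returning to the header instead of on NULL.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for this sketch.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node(const std::string& n, int y) : name(n), year(y), next(nullptr) {}
};

// Empty circular list: a header with year 0 whose next is itself.
Node* makeList() {
    Node* h = new Node("", 0);
    h->next = h;
    return h;
}

bool isEmpty(const Node* header) { return header->next == header; }

// Precondition: year > 0, so the header's year 0 always stops the scan.
void insert(Node* header, const std::string& name, int year) {
    assert(year > 0);
    Node* q = header;
    Node* p = header->next;
    while (p->year >= year) {   // no p != NULL test needed
        q = p;
        p = p->next;
    }
    Node* n = new Node(name, year);
    n->next = p;                // may be the header: keeps the circle closed
    q->next = n;
}

int length(const Node* header) {
    int count = 0;
    for (const Node* p = header->next; p != header; p = p->next)
        ++count;
    return count;
}
```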

      4. Let's consider how our code would be modified to use circular
         linking along with a header.

         a. Project studentlisthc.cc - down to just before class Node.  We
            will now consider changes in this code relative to the version
            with a header we just considered - i.e. cumulatively from our
            original code.  (Circular lists don't always have headers, but
            it often makes sense to use one, as in this case.)

         b. Any changes needed to Node class? 

            ASK

            NO

         c. Any changes needed to constructor?

            ASK

            Discuss code projected versus handout

         d. Any changes needed to isEmpty()?

            ASK

            Discuss code projected versus handout

         e. Any changes needed to makeEmpty()?

            ASK

            Discuss code projected versus handout - note that we don't
            recycle the header, just the nodes containing data

         f. Any changes needed to insert()?

            ASK

            Discuss code projected versus handout

         g. Any changes needed to getFirst()?

            ASK

            Discuss code projected versus handout

         h. Any changes needed to removeFirst()?

            ASK

            Discuss code projected versus handout

         i. Any changes needed to print()?

            ASK

            Discuss code projected versus handout

         j. Any changes needed to copy constructor?

            ASK

            Discuss code projected versus handout

         k. Any changes needed to operator = ?

            ASK

            Discuss code projected versus handout

         l. Any changes needed to destructor?

            ASK

            Discuss code projected versus handout

         HANDOUT with changes

   C. One more modification: Double Linking

      1. Recall that linked lists, as we have developed them thus far, are
         like one way streets.

         a. You can get from a node to its successor by following one link.

         b. The only way to get to the predecessor of a node is to start at
            the beginning of the list, using leading and trailing pointers,
            until the leading pointer hits the node you want.

      2. Sometimes, it is desired to be able to go both forward AND backward
         from a given node in a list.  If this is the case, we can use
         a DOUBLY-LINKED list, in which each node contains two pointers:
         one to its successor and one to its predecessor.

         [  ] --> [  ] --> [  ]
         [  ] <-- [  ] <-- [  ]

         a. We will refer to these pointers as the forward link and the
            backward link - or next and prev for short.

         b. Doubly-linked lists need not use a header node, but often do
            (because otherwise the special cases become a real problem.)
            If the list does have a header, then:

            i. The next of the header points to the first real item.

           ii. The prev of the first real item points to the header.

          iii. The prev of the header MAY be used to point to the last item,
               if this is useful (and it often is.)

         c. Indeed, when a header is used, it is often expedient to also use
            circular linking.

            i. Having the prev of the header point to the last item is part
               of this.

           ii. Likewise, the next of the last item would point to the header.

          iii. QUESTION: What would an empty doubly-linked circular list with
               a header look like?

               -- a doubly-narcissistic header node!

         d. Redraw original example doubly-linked circular with header.
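         A sketch of a doubly-linked circular list with a header, under the
         same year-0 header assumption.  A new Node here starts out pointing
         to itself in both directions, so a freshly made header is exactly
         the "doubly-narcissistic" empty list.  The helpers makeList(),
         linkBefore(), unlink() and getLast() are hypothetical names: they
         show that with a prev link, linking and unlinking need no trailing
         pointer, and the header's prev reaches the last element in O(1).

```cpp
#include <cassert>
#include <string>

// Hypothetical node with forward (next) and backward (prev) links.
struct Node {
    std::string name;
    int year;
    Node* next;
    Node* prev;
    Node(const std::string& n, int y)
        : name(n), year(y), next(this), prev(this) {}   // points to itself
};

Node* makeList() { return new Node("", 0); }   // header alone = empty list

bool isEmpty(const Node* header) { return header->next == header; }

// Links n in just before p; both neighbours are reachable from p.
void linkBefore(Node* p, Node* n) {
    n->next = p;
    n->prev = p->prev;
    p->prev->next = n;
    p->prev = n;
}

// Precondition: year > 0, so the header's year 0 stops the scan.
void insert(Node* header, const std::string& name, int year) {
    assert(year > 0);
    Node* p = header->next;
    while (p->year >= year)
        p = p->next;             // only a leading pointer is needed
    linkBefore(p, new Node(name, year));
}

// Unlinks and recycles a real (non-header) node in O(1).
void unlink(Node* p) {
    p->prev->next = p->next;
    p->next->prev = p->prev;
    delete p;
}

std::string getLast(const Node* header) {
    assert(!isEmpty(header));
    return header->prev->name;   // header's prev is the last item
}
```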

      3. Let's consider how our code would be modified to use double linking,
         along with circular linking and a header.

         a. Project studentlistd.cc - down to just before class Node.  We
            will now consider changes in this code relative to the circular
            version with a header we just considered - i.e. continuing to be
            cumulative from our original code.

         b. Any changes needed to Node class? 

            ASK

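            YES - each node now needs a second link field (prev) pointing
            to its predecessor.  Discuss code projected versus handout.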

         c. Any changes needed to constructor?

            ASK

            Discuss code projected versus handout

         d. Any changes needed to isEmpty()?

            ASK

            Discuss code projected versus handout

         e. Any changes needed to makeEmpty()?

            ASK

            Discuss code projected versus handout - note that we don't
            recycle the header, just the nodes containing data

         f. Any changes needed to insert()?

            ASK

            Discuss code projected versus handout

         g. Any changes needed to getFirst()?

            ASK

            Discuss code projected versus handout

         h. Any changes needed to removeFirst()?

            ASK

            Discuss code projected versus handout

         i. Any changes needed to print()?

            ASK

            Discuss code projected versus handout

         j. Any changes needed to copy constructor?

            ASK

            Discuss code projected versus handout

         k. Any changes needed to operator = ?

            ASK

            Discuss code projected versus handout

         l. Any changes needed to destructor?

            ASK

            Discuss code projected versus handout

         HANDOUT with changes

IV. Lists in the C++ Standard Template Library (STL)
--  ----- -- --- --- -------- -------- ------- -----

   A. The Standard Template Library (STL) includes a list template, which
      actually uses a linked list, but doesn't require the user to deal
      directly with pointers.

   B. HANDOUT comparing the C++ list template and the Java List collection

      Some points to note:

      1. The list template is made available by

         #include <list>

      2. The template is instantiated for a particular type of list element
         
         a. This is accomplished by

            typedef list < Student * > WaitingList;   (near bottom of page 1)

         b. A field of this type is then created by

            WaitingList waiting;

         c. These two declarations could have been combined into one
            declaration

            list < Student * > waiting;

            However, this would complicate the syntax needed in the
            implementation file (e.g. every iterator declaration would have
            to spell out list < Student * >::iterator), so use of typedef is
            normally preferred.

         d. We use Student * as the type for the list elements, rather than
            Student, because we only want one object per student, perhaps
            with multiple references to it.  (Recall that the C++ pointer is
            closest in meaning to the Java reference.)

      3. The Java List interface has two distinct implementations: ArrayList
         and LinkedList.  The C++ list template is always implemented by
         a linked list - a doubly linked one, at that!

      4. The C++ list template does NOT include a member function for
         checking to see if a particular element occurs in the list
         (analogous to the Java List contains() method).  In the C++ version
         of isWaiting(), then, we have to do things the hard way by going
         through the list one element at a time and checking for a match.
         We can return true as soon as we find a match.  We return false if
         we complete the loop without finding a match.  (The generic
         std::find algorithm from <algorithm> packages up exactly this loop,
         and works on any STL container.)
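         A sketch of isWaiting() using the STL list.  The Student struct is
         a hypothetical stand-in for the handout's class, and as an
         assumption a match means the same Student object (pointer
         equality).  The second function, isWaitingFind(), is a hypothetical
         variant doing the same search with std::find.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for the handout's Student class.
struct Student {
    std::string name;
    explicit Student(const std::string& n) : name(n) {}
};

typedef std::list<Student*> WaitingList;

// The "hard way": walk the list, returning true at the first match.
bool isWaiting(const WaitingList& waiting, const Student* s) {
    for (WaitingList::const_iterator it = waiting.begin();
         it != waiting.end(); ++it) {
        if (*it == s)
            return true;
    }
    return false;            // finished the loop without a match
}

// The same test written with the generic std::find algorithm.
bool isWaitingFind(const WaitingList& waiting, const Student* s) {
    return std::find(waiting.begin(), waiting.end(), s) != waiting.end();
}
```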

      5. Both the C++ STL and the Java collections facility support iterators
         for systematically visiting the elements of a collection - but the
         syntax is quite different.

         a. For each instantiation of a C++ STL container template, there
            are corresponding iterator types.  In this example, when we
            instantiated list for Student * using the typedef name
            WaitingList, we also automatically created four types of
            iterators:

            WaitingList::iterator
            WaitingList::const_iterator
            WaitingList::reverse_iterator
            WaitingList::const_reverse_iterator

            i. The const forms of the iterator do not permit modification of
               the elements of the container through them - and must be used
               if we create an iterator in a const method.

           ii. The reverse forms of the iterator go through the container
               backwards!  (Not all container types support reverse iteration,
               but list does.)

         b. A C++ container has at least two methods that create iterators -
            begin() returns an iterator that refers to the first element in
            the collection, and end() returns one that refers to the position
            one past the last element.  (Containers that support reverse
            iterators, as list does, also have methods rbegin() and rend().)

         c. An iterator supports the following operations:

            i. == - compare two iterators.  Two iterators are equal just when
               they refer to the same list element.  Of course, != is defined
               as ! ( == ).  

               The comparison iter != end() has a similar meaning to the
               Java iterator method hasNext().

           ii. * - dereference the iterator to get at the element it
               currently refers to - similar to one of the functions of
               Java's next() method.

          iii. ++ - advance the iterator to the next element - similar
               to the other function of Java's next() method.

         d. Note carefully the C++ code that uses iterators in isWaiting()
            and printReport().  This code is a "C++ idiom".
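         The idiom, sketched for a hypothetical printReport() that writes
         each waiting student's name, plus a hypothetical printReversed()
         that walks the list backwards with const_reverse_iterator.  Taking
         the list as a const reference is what requires the const_ iterator
         types.  Student is again a hypothetical stand-in, and writing to a
         std::ostream parameter is an assumption made for this sketch.

```cpp
#include <cassert>
#include <list>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the handout's Student class.
struct Student {
    std::string name;
    explicit Student(const std::string& n) : name(n) {}
};

typedef std::list<Student*> WaitingList;

// Front to back: begin(), != end(), ++, and * to reach the element.
void printReport(const WaitingList& waiting, std::ostream& out) {
    for (WaitingList::const_iterator it = waiting.begin();
         it != waiting.end(); ++it)
        out << (*it)->name << '\n';   // *it is a Student*, hence ->
}

// Back to front, using rbegin() and rend().
void printReversed(const WaitingList& waiting, std::ostream& out) {
    for (WaitingList::const_reverse_iterator it = waiting.rbegin();
         it != waiting.rend(); ++it)
        out << (*it)->name << '\n';
}
```

         In C++11 and later, the same forward loop can also be written as
         for (const Student *s : waiting), which uses begin() and end()
         behind the scenes.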